Let's try to understand what each column represents

Visualization

It is difficult to understand the various grading factors that are crucial for assigning a grade to a borrower. Let's visualize how each factor influences the interest rate.

1. Location

The location of a given loan is another factor to consider when making investment decisions. Each local market is different and may affect how well the completed property will sell and how much it will fetch. Lower-risk projects tend to be located in areas with strong real estate markets.

Additionally, location heavily influences the ease of foreclosure, should it become necessary, which may be worth considering for some investors. This is especially true for mortgage loans.

Surprisingly, the state with the highest overall interest rate is Hawaii, which actually has the lowest mortgage interest rate. This dataset also includes personal loans, which may explain why Hawaii shows a higher rate. We would need to filter the records by loan type, among other things, to gain more insight.

We have to refer to state-specific laws and the interest rates set by the Federal Reserve to track the interest rate accurately.

Here, we can see which state has the highest interest rate. Although the interest rate depends on various factors, the main criteria may be the per capita income, poverty level, taxes, and tax system of that state.
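The state-level comparison described above can be sketched as a simple group-by. This is a minimal sketch, not the notebook's actual code: the column names `addr_state` and `int_rate` are assumptions about the dataset, and the rows are toy values for illustration only.

```python
import pandas as pd


def mean_rate_by_state(df, state_col="addr_state", rate_col="int_rate"):
    """Return the mean interest rate per state, highest first.

    The default column names are assumptions; adjust them to the dataset.
    """
    return (
        df.groupby(state_col)[rate_col]
        .mean()
        .sort_values(ascending=False)
    )


# Toy rows for illustration only; not taken from the dataset.
loans = pd.DataFrame({
    "addr_state": ["HI", "HI", "CA", "CA", "TX"],
    "int_rate": [15.0, 17.0, 12.0, 14.0, 13.5],
})

by_state = mean_rate_by_state(loans)
print(by_state)
```

Filtering `df` by loan type before calling the function would give the per-product view mentioned above, provided the dataset has a column that identifies the loan type.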

2. Grade Assigned

Loans can be assigned one of seven letter grades from A to G, and each grade generally reflects the overall risk of the loan. For example, Grade A loans generally have lower expected returns, lower expected loan losses, and correspondingly lower interest payments, whereas at the other end of the spectrum, Grade G loans have higher expected returns and higher potential loan losses, but correspondingly higher interest rates. With Groundfloor, you create a custom portfolio of real estate investments based on your own investment criteria and risk tolerances.

Above, we can see that the interest rate rises as the grade associated with the loan drops. The relation shown in the graph seems accurate.
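One way to check this relation numerically is to compare the median interest rate per grade and test whether it increases from A towards G. The sketch below assumes columns named `grade` and `int_rate`, which may differ in the actual dataset, and uses toy rows for illustration.

```python
import pandas as pd


def median_rate_by_grade(df, grade_col="grade", rate_col="int_rate"):
    """Median interest rate per letter grade, ordered A..G.

    Column names are assumptions about the dataset.
    """
    return df.groupby(grade_col)[rate_col].median().sort_index()


# Toy rows for illustration only; not taken from the dataset.
loans = pd.DataFrame({
    "grade": ["A", "A", "B", "C", "C", "D"],
    "int_rate": [6.0, 7.0, 10.0, 13.0, 14.0, 17.5],
})

by_grade = median_rate_by_grade(loans)
# True when each grade's median rate is at least that of the grade before it.
rises_with_grade = by_grade.is_monotonic_increasing
print(by_grade)
print("Rate rises as grade drops:", rises_with_grade)
```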

As the interest rate depends mostly on the grade of the loan, we can start analyzing the relation between the assigned grades and the other variables.

Moreover, we also noticed that the interest rate does not appear to depend on any other variable in the dataset. In practice, sub-grades are assigned between A1 and G5, and our dataset follows that format, which is a good sign. The grade is determined based on the FICO score, which we are not given. Assuming the grades were calculated from the FICO score, we can still take it into account indirectly: the credit limit and the credit used give us the credit utilization, and a portion of the FICO score is determined by credit utilization.

3. Credit Utilization

We can plot credit utilization for further analysis. Credit utilization rate has proven to be extremely predictive of future repayment risk, so it is often an important factor in a person's score. Generally speaking, the higher your utilization rate, the greater the risk that you will default on a credit account within the next two years.

Here, we can see that credit utilization plays a vital role in determining the FICO score, which is used to assign grades to borrowers. As credit utilization rises, the assigned grade gets worse. Judging by the chart above, it is safe to say that the 0.3 credit utilization guideline holds on this dataset as well: credit utilization above 0.3 can bring the grade down.

One outlier here is sub-grade G4, which has low credit utilization. There is only one record with the G4 sub-grade, which is why we can treat it as an outlier.
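The utilization analysis, including the check for thinly populated sub-grades such as G4, could be sketched as follows. Credit utilization is computed as credit used divided by credit limit. The column names `credit_used`, `credit_limit`, and `sub_grade`, the `min_records` threshold, and the toy rows are all assumptions made for illustration.

```python
import pandas as pd


def utilization_by_sub_grade(
    df,
    used_col="credit_used",
    limit_col="credit_limit",
    sub_grade_col="sub_grade",
    min_records=5,
):
    """Mean credit utilization and record count per sub-grade.

    Sub-grades with fewer than `min_records` rows are flagged, since a mean
    over so few rows is not reliable. All names here are hypothetical.
    """
    # Skip rows with a zero or missing limit to avoid dividing by zero.
    valid = df[df[limit_col] > 0].copy()
    valid["utilization"] = valid[used_col] / valid[limit_col]
    summary = valid.groupby(sub_grade_col)["utilization"].agg(
        mean_utilization="mean", n_records="count"
    )
    summary["too_few_records"] = summary["n_records"] < min_records
    return summary.sort_index()


# Toy rows for illustration only; not taken from the dataset.
loans = pd.DataFrame({
    "sub_grade": ["A1", "A1", "G4"],
    "credit_used": [1000.0, 2000.0, 500.0],
    "credit_limit": [10000.0, 10000.0, 5000.0],
})

summary = utilization_by_sub_grade(loans, min_records=2)
print(summary)
```

Flagging by record count rather than by the utilization value itself keeps the outlier decision tied to the reason given above: a single record is too little evidence for the sub-grade.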

4. Loan Amount

Here, the loan amount can also bring down the grade: the grade depends on the credit limit, among other things, and asking to borrow more money relative to the credit limit can lower it. An important thing to note is that the distribution is quite dispersed for each grade. For grade G, however, the total loan amount is quite high, which suggests that borrowers who have a bad credit score and request a larger loan amount are likely to end up with a lower grade.
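The dispersion of loan amounts within each grade can be summarized with a few per-grade statistics. This is a rough sketch under assumed column names (`grade`, `loan_amnt`), using toy rows rather than the actual data.

```python
import pandas as pd


def loan_amount_by_grade(df, grade_col="grade", amount_col="loan_amnt"):
    """Count, median, and range of loan amounts per grade.

    Column names are assumptions about the dataset.
    """
    summary = df.groupby(grade_col)[amount_col].agg(
        n_loans="count",
        median_amount="median",
        min_amount="min",
        max_amount="max",
    )
    # The range is a crude dispersion measure; a box plot shows more detail.
    summary["amount_range"] = summary["max_amount"] - summary["min_amount"]
    return summary.sort_index()


# Toy rows for illustration only; not taken from the dataset.
loans = pd.DataFrame({
    "grade": ["A", "A", "A", "G", "G"],
    "loan_amnt": [5000.0, 10000.0, 15000.0, 20000.0, 30000.0],
})

amounts = loan_amount_by_grade(loans)
print(amounts)
```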

Issues with the dataset

Assumptions

Future Work